MatLM: a Matrix Formulation for Probabilistic Language Models

Authors

  • Yanshan Wang
  • Hongfang Liu
Abstract

Probabilistic language models are widely used in Information Retrieval (IR) to rank documents by the probability that they generate the query. However, implementing these probabilistic representations in programming languages that favor matrix calculations is challenging. In this paper, we use matrix representations to reformulate probabilistic language models. The matrix representation serves as a superstructure that organizes the calculated probabilities, and as a potential formalism for standardizing language models and for further mathematical analysis. It also facilitates implementation in matrix-friendly programming languages. We consider the matrix formulation of the conventional language model with Dirichlet smoothing and of two language models based on Latent Dirichlet Allocation (LDA), namely LBDM and LDI. We release a Java software package, MatLM, implementing the proposed models. Code is available at: https://github.com/yanshanwang/JGibbLDA-v.1.0-MatLM.
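
As a rough illustration of what a matrix view of the Dirichlet-smoothed query-likelihood model can look like, the sketch below builds a document-by-term probability matrix from raw counts using the standard smoothing formula P(w|d) = (c(w,d) + mu * P(w|C)) / (|d| + mu), then scores queries as a product of a query-by-term count matrix with the log of that matrix. It is written in plain Python rather than Java, and it is not the MatLM package's API: the function names `dirichlet_matrix` and `log_likelihood_matrix`, the toy counts, and the value of `mu` are all assumptions made for this example. The LDA-based models (LBDM and LDI) are not covered.

```python
import math

def dirichlet_matrix(counts, mu=2000.0):
    """Return a D x V matrix of Dirichlet-smoothed P(w|d).

    counts is a D x V list of lists of term counts (hypothetical input layout).
    """
    doc_len = [sum(row) for row in counts]
    total = sum(doc_len)
    n_terms = len(counts[0])
    # Collection model P(w|C): column sums divided by the collection length.
    coll = [sum(row[w] for row in counts) / total for w in range(n_terms)]
    return [
        [(row[w] + mu * coll[w]) / (doc_len[d] + mu) for w in range(n_terms)]
        for d, row in enumerate(counts)
    ]

def log_likelihood_matrix(queries, probs):
    """Return a Q x D matrix of log query likelihoods.

    Equivalent to queries (Q x V) times the transpose of log(probs) (V x D).
    Assumes every query term occurs somewhere in the collection, so that
    no smoothed probability used here is zero.
    """
    return [
        [sum(q[w] * math.log(p[w]) for w in range(len(q)) if q[w] > 0)
         for p in probs]
        for q in queries
    ]

# Toy data: 2 documents over a 3-term vocabulary, and 1 query using term 0.
counts = [[2, 1, 0], [0, 1, 3]]
queries = [[1, 0, 0]]

probs = dirichlet_matrix(counts, mu=1.0)
scores = log_likelihood_matrix(queries, probs)

# Rank documents for the first query by descending log likelihood.
ranking = sorted(range(len(counts)), key=lambda d: scores[0][d], reverse=True)
print(ranking)
```

Each row of the smoothed matrix is a probability distribution over the vocabulary, so all queries can be scored against all documents in a single matrix product. In a matrix-oriented language or library, the two nested comprehensions would collapse into element-wise operations and one multiplication.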

Similar resources

Learning Distributed Representations for Statistical Language Modelling and Collaborative Filtering

Andriy Mnih, Doctor of Philosophy, Graduate Department of Computer Science, University of Toronto, 2010. With the increasing availability of large datasets, machine learning techniques are becoming an increasingly attractive alternative to expert-designed approaches to solving complex problems in domai...

Implicational Scaling of Reading Comprehension Construct: Is it Deterministic or Probabilistic?

In English as a Second Language teaching and testing situations, it is common to infer a learner's reading ability from his or her total score on a reading test. This assumes that reading items are unidimensional and reproducible. However, few studies have been conducted to probe the issue through psychometric analyses. In the present study, the IELTS exemplar module C (1994) wa...

Preparation of Sustained-Release Matrix Tablets of Aspirin with Ethylcellulose, Eudragit RS100 and Eudragit S100 and Studying the Release Profiles and their Sensitivity to Tablet Hardness

A sustained-release tablet formulation should ideally have a proper release profile that is insensitive to the moderate changes in tablet hardness usually encountered in manufacturing. In this study, matrix aspirin (acetylsalicylic acid) tablets with ethylcellulose (EC), Eudragit RS100 (RS), and Eudragit S100 (S) were prepared by direct compression. The release behaviors were then studied in two co...

Random Matrix Approach: Toward Probabilistic Formulation of the Manipulator Jacobian

In this paper, we formulate the manipulator Jacobian matrix in a probabilistic framework based on random matrix theory (RMT). Because limited information is available on the system fluctuations, parametric approaches often prove inadequate for characterizing the uncertainty. To overcome this difficulty, we develop two RMT-based probabilistic models for the Jacobian matri...

Journal title:
  • CoRR

Volume: abs/1610.00735

Publication date: 2016